Skip to content

Deduplicate walltime perf symbols, unwind data and debug info across pids#240

Open
GuillaumeLagrange wants to merge 4 commits intorunner-shared-deduplication-changesfrom
cod-2138-deduplicate-perf-maps-and-unwind_data
Open

Deduplicate walltime perf symbols, unwind data and debug info across pids#240
GuillaumeLagrange wants to merge 4 commits intorunner-shared-deduplication-changesfrom
cod-2138-deduplicate-perf-maps-and-unwind_data

Conversation

@GuillaumeLagrange
Copy link
Contributor

@GuillaumeLagrange GuillaumeLagrange commented Feb 11, 2026

Merge/deployment

  1. Merge feat(runner-shared): add new fields to perf metadata #246 (on top of which this PR is based)
  2. Merge the backend PR refering to the changed perf metadata changes
  3. Merge this pr and release runner once the changes in the backend are in prod

Content

Since the real deduplicator source of truth is actually the path of the elf file on disk, I settled on

{path_index}__{file_name}.unwind_data
{path_index}__{file_name}.symbols.map

it allows for the filename to be determined by the actual path, without having to either nest all the files, or have to sanitize paths to file name and end up in awkward file-name length limitations

Here's what it looks like on a 1000 iterations report of one simple program

$ ls
0__exec-harness.symbols.map          3__ld-linux-x86-64.so.2.unwind_data  7__nix-ld.symbols.map         perf.metadata
0__exec-harness.unwind_data          4__ld-linux-x86-64.so.2.symbols.map  7__nix-ld.unwind_data         perf.pipedata
1__libm.so.6.symbols.map             4__ld-linux-x86-64.so.2.unwind_data  8__libc.so.6.symbols.map      results
1__libm.so.6.unwind_data             5__libgcc_s.so.1.symbols.map         8__libc.so.6.unwind_data      runner.log
2__graph-benchmark.symbols.map       5__libgcc_s.so.1.unwind_data         9__libgcc_s.so.1.symbols.map
2__graph-benchmark.unwind_data       6__libc.so.6.symbols.map             9__libgcc_s.so.1.unwind_data
3__ld-linux-x86-64.so.2.symbols.map  6__libc.so.6.unwind_data             ExecutionTimestamps.msgpack


@GuillaumeLagrange GuillaumeLagrange changed the title Cod 2138 deduplicate perf maps and unwind data Deduplicate walltime perf symbols, unwind data and debug info across pids Feb 11, 2026
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from 99e0ea4 to 1a32faa Compare February 11, 2026 14:45
@codspeed-hq
Copy link

codspeed-hq bot commented Feb 11, 2026

Merging this PR will not alter performance

✅ 4 untouched benchmarks


Comparing cod-2138-deduplicate-perf-maps-and-unwind_data (642e4fd) with runner-shared-deduplication-changes (54f94be)

Open in CodSpeed

@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch 2 times, most recently from 15316f4 to 9305a03 Compare February 12, 2026 03:37
@GuillaumeLagrange GuillaumeLagrange marked this pull request as ready for review February 12, 2026 08:49
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from a3617af to 7f65690 Compare February 12, 2026 11:14
Copy link
Member

@not-matthias not-matthias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the struct naming can be clarified a bit, I was struggling quite a lot to understand how they map together. Maybe also using custom types for the key that's used in the hashmaps.

The overall logic looks good! Next round should be quick

@GuillaumeLagrange GuillaumeLagrange removed the request for review from art049 February 13, 2026 11:04
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch 5 times, most recently from 192d2ce to 14cb24f Compare February 13, 2026 16:45
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from 14cb24f to 9a7f9bc Compare February 13, 2026 16:55
Copy link
Member

@not-matthias not-matthias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Just a few minor stylistic comments.

@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch 2 times, most recently from e43f420 to f308463 Compare February 16, 2026 11:36
@GuillaumeLagrange GuillaumeLagrange changed the base branch from main to runner-shared-deduplication-changes February 16, 2026 11:36
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch 2 times, most recently from 31d5b35 to 642e4fd Compare February 16, 2026 13:19
This fixes a regression introduced in 8b37208.
We now filter pids using `bench_pids`, except for `exec-harness` integrations,
where we take all pids.
- Store pid-agnostic data in a file or json map under a mapped
`path_key` for each elf
- For each pid, store pid specific data, mostly the computed
load_bias from where each module was loaded into memory at
runtime, alongside a key to retrieve the pid-agnostic data

This way, we only write to disk relevant parts of the information.
Also add a rebuild trigger to make it easier to run
GITHUB_ACTIONS=1 cargo test` locally.

We could have a better trigger, but this will do for now.
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-2138-deduplicate-perf-maps-and-unwind_data branch from 642e4fd to 82a55b9 Compare February 16, 2026 13:29
@GuillaumeLagrange GuillaumeLagrange force-pushed the runner-shared-deduplication-changes branch from 54f94be to 5bb49e6 Compare February 16, 2026 13:29
Copy link
Member

@art049 art049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the naming thing TBD
Otherwise, lgtm! this needs extensive testing before release though

Comment on lines +63 to +72
/// Register a path in the map if absent, assigning a new unique key.
/// Returns the assigned key.
fn get_or_insert_key(path_to_key: &mut HashMap<PathBuf, String>, path: &Path) -> String {
if let Some(key) = path_to_key.get(path) {
return key.clone();
}
let key = naming::indexed_semantic_key(path_to_key.len(), path);
path_to_key.insert(path.to_owned(), key.clone());
key
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we loose a bit of info here. IIRC, we ended up saying that hashing the path was a better option no?
I know we iterated quite a lot but here we loose all info about similar prefixes (when in the same path) even if it doesn't happen often

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants